Description

In this notebook we explore the explainability of a binary XGBoost classifier.

Imports

Load the data

Train-test Split with Stratify
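A minimal sketch of a stratified split, using synthetic stand-in data (the column names follow the notebook; the generated values are assumptions). `stratify=y` keeps the class ratio roughly identical in both splits, which matters for imbalanced binary targets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's dataset
rng = np.random.RandomState(42)
df = pd.DataFrame({
    "BMI": rng.normal(27.0, 5.0, 200),
    "Medical_History_4": rng.randint(0, 3, 200),
})
df["target"] = (df["BMI"] > 28).astype(int)

X = df.drop(columns="target")
y = df["target"]

# stratify=y preserves the class ratio in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(f"train positives: {y_train.mean():.3f}, test positives: {y_test.mean():.3f}")
```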

Modelling: XGBoost Classifier

Feature Importances

Model Evaluation: using eli5

eli5: Permutation Importance (show weights)

eli5: explain weights

eli5: show prediction

Model Evaluation: using PDP

pdp.pdp_isolate(model, dataset, model_features,
feature, num_grid_points=10, 
grid_type='percentile', percentile_range=None,
grid_range=None, cust_grid_points=None, 
memory_limit=0.5, n_jobs=1, predict_kwds=None, 
data_transformer=None)

Make sure n_jobs=1 when using an XGBoost model.

PDP: pdp_isolate


Check Prediction Distribution

info_plots.actual_plot(model, X, feature,
feature_name, num_grid_points=10,
grid_type='percentile', percentile_range=None,
grid_range=None, cust_grid_points=None,
show_percentile=False, show_outliers=False,
endpoint=True, which_classes=None,
predict_kwds=None, ncols=2, figsize=None, 
plot_params=None)

Parameters
----------

model: a fitted sklearn model
X: pandas DataFrame
    data set on which the model is trained
which_classes: list, optional, default=None
    which classes to plot, only use when it is a multi-class problem

Partial Dependence Plot (PDP)

Interaction between two variables: 'BMI' and 'Medical_History_4' with the target

Prediction distribution across the feature combination of 'BMI' and 'Medical_History_4'

Model Evaluation: plots using SHAP

SHAP = SHapley Additive exPlanations

Get SHAP Values

SHAP: Summary Plot

SHAP: Force Plot

SHAP: Dependence Plot